Abstract


This paper studies the allocation of experts to works in large-scale innovation competitions and establishes a 0-1 programming model, a Z-score evaluation model, a PCA evaluation model, a fuzzy evaluation model, a "range" model, and a difference perception evaluation model. It mainly uses a linear programming algorithm and the Pearson correlation coefficient (PCC) to obtain the optimal plan for allocating experts to works. We use linear programming to establish a 0-1 programming model whose objective function maximizes the coverage rate of review experts over participating teams and the number of cross reviews, subject to the constraints that each work is reviewed by 5 different review experts, each expert reviews 120 different works, and all variables are non-negative integers. An optimal solution is obtained, and the average distance between the review experts and the works is calculated as 1439.29893; at this point, the comparability of the review plan is strong. By analysing the relationship between the experts' original scores and the awards for works, three evaluation schemes are redesigned: a Z-score evaluation scheme, a PCA evaluation scheme, and a fuzzy evaluation scheme. The correlation coefficients between these three schemes and the original scheme are 0.94, 0.86, and 0.89 respectively. Therefore, the Z-score evaluation scheme is judged better than the other two schemes. We have designed a new standard score calculation formula to better reflect the level of a work within the overall set of works, used it to calculate the score ranking, and compared that ranking with the actual ranking of award-winning papers agreed upon by multiple experts. The correlation is 96.32%, indicating that the new standard score formula is reliable. By calculating and analysing the average score and range of the two review stages, we find that the average score and range of the first stage are significantly higher than those of the second stage. The reason may be that the experts in the second stage apply stricter standards and their evaluations are more uniform, which evaluates works more accurately and reduces differences in scores among experts. Therefore, it is more reasonable to adopt two stages for review. For works whose range is excessively large and lies outside the low range segment, this paper establishes a "range" model so that these works can be processed programmatically. This model shows that works with large range values have a connection with awards. Based on the method of differential decision-making, we establish a complete innovation competition evaluation model, the difference perception evaluation model. This model integrates the characteristics of multiple reviews and the differences between expert judges and can quickly output the final results; evaluating and testing the model gives an MSE of 0.24 and an R² of 0.99. The results show that the model can effectively improve the quality and efficiency of the review. The paper also makes a preliminary comparison and analysis of current review methods and provides suggestions and ideas for future review methods and processes.
Abbreviations: PCA: Principal Component Analysis; AHP: Analytic Hierarchy Process; MSE: Mean Square Error; Z-score: Standard Score; PC: Principal Component.


Introduction 


Innovation competitions can stimulate students’ team spirit, problem-solving skills, and scientific thinking. However, the judging process faces many challenges, such as the large number of works and their uneven quality. Research on the evaluation scheme of innovation competitions is significant in three ways [1]. First, it provides innovative ideas and methods for evaluation experts to minimize the scoring error. Second, it can provide the organizer with new evaluation standards, provide fair and transparent evaluation results, and enhance the credibility of the event. Third, it provides new data and models for research related to innovation competitions [2] and promotes the development of innovation competitions.


To address challenges of large-scale innovation 
competitions, researchers have proposed various evaluation 
strategies. In the early stages, subjective judgment methods 
were mainly used. This approach saves time during the 
evaluation process but may be influenced by individual 
preferences. On this basis, standardized evaluation rules 
were introduced to reduce some subjectivity. Currently, 
competitions are optimizing their own evaluation plans and 
attempting to use various integration strategies to improve 
evaluation quality. Xu Z, et al. [3] used multi-attribute 
decision-making methods to consider factors such as expert 
reviewer numbers and submission quality, weighting each 
factor to enhance fairness in the evaluation. Zhuang S, et
al. [4] employed the analytic hierarchy process (AHP) [5] and fuzzy
evaluation methods to show that increasing the number
of reviewers and submissions has a positive impact on
evaluation results. Wang F, et al. [6] proposed a data-driven 
evaluation method [7] using data analysis to assist decision- 
making. Although these current evaluation methods have 
made progress, they still face significant challenges in 
optimizing the evaluation plan for large-scale innovation 
competitions [8]. 


The purpose of the evaluation scheme of a large-scale innovation competition is to evaluate the quality and innovation of the works fairly and scientifically across different reviewers [9]. To achieve this goal, the following models are established: the 0-1 programming model, the comprehensive review model, the “range” model [10-12], and the difference perception evaluation model.


Methods 


We have employed four methods to address different 
issues in evaluating large-scale innovative competitions, and 
method two contains three sub-methods [13]. The following 
is the model analysis corresponding to each method. 


0-1 Programming Model Analysis


We can first set an optimization goal for the evaluation 
plan, such as minimizing the scoring differences among 
evaluation experts or maximizing the intersection of 
evaluations by experts [14]. Next, we can use linear [15-17] 
programming algorithms to find the optimal or approximate 
optimal evaluation plan in the feasible solution space. Finally, 
we can design relevant indicators. 


Figure 1: 0-1 Programming Model Mind Map. (Branches: 1. set optimization goals: minimize scoring differences, maximize the intersection of works; 2. determine the feasible solution space: determine the scheme according to the number of judges; 3. use linear programming: find the optimal review plan; 4. design relevant indicators: review overlap rate, review coverage, evaluation balance, evaluate the merits and demerits of the scheme.)


The Comprehensive Review Model affects the optimization of the “cross distribution” plan produced by the 0-1 Programming Model. We can select two or more evaluation plans, such as Z-score evaluation [18], PCA evaluation [19,20], and fuzzy evaluation [21-23], and use the data in the attachment to analyse the distribution characteristics of each expert’s original scores and adjusted scores, as well as the characteristics of the work rankings under the different plans. Furthermore, we can compare these plans based on the analysis results and design a new standard score calculation model [24].


Figure 2: Comprehensive Review Model Mind Map. (Branches: 1. select a review plan: Z-score evaluation scheme, PCA evaluation scheme, fuzzy comprehensive evaluation scheme; 2. analyze the distribution characteristics of expert scores and work scores: original results of experts and works, distribution characteristics after adjustment, mean, variance, range, skewness, etc.; 3. compare and evaluate the pros and cons of the schemes: consistency, stability, sensitivity, determine the best review plan; 4. design a new standard score calculation model; 5. improve the standard score calculation model: introduce correction factors, weights, fuzzy functions and so on to improve the effectiveness of the evaluation schemes.)


“Range” Model Analysis: In addressing the “Range” Model, it is important to note that while range is an important indicator for measuring rating differences and innovation, the quality of a work cannot be determined solely from the magnitude of its range [25].


Figure 3: “Range” Model Mind Map. (Branches: 1. establishment of the evaluation model: consider range and other relevant factors, consider the level and distribution of scores comprehensively, pay attention to expert grading tendency and evaluation criteria; 2. connection between Model I and Model II: the relationship between range and innovation, differences in grading levels and standards; 3. practical application value of Model 3: classification and adjustment of works, further improving the quality of evaluation.)
Difference Perception Review Model Analysis: Addressing the Difference Perception Review Model requires consideration of the completeness and systematicity of innovative competition evaluations [26].


Figure 4: Difference Perception Review Model Mind Map. (Branches: 1. find the right calculation method: combined with the comprehensive review model, the Z-score standard scoring method is proposed; 2. establish a complete review model: based on the differentiation of the judges, build a difference perception model; 3. analyze the existing model: the analysis is based on the current situation; 4. make suggestions for the future review process: suggestions with high feasibility, improve review efficiency, improve fairness and scientificity.)


Model Assumptions 


Based on the issues raised in this paper and the analysis 
above, we have made the following model assumptions: 


1. It is assumed that the data provided are reliable and accurate.
2. It is assumed that the evaluation levels of the experts involved are comparable.
3. It is assumed that the experts are evaluating in the same environment without any external factors interfering with their evaluation.
4. It is assumed that the experts’ evaluations are relatively objective, unaffected by other factors, and only related to the quality and innovation of the work.


Symbol Description


The commonly used symbols are shown in the following table, and other symbols are described in each chapter of the paper.

Parameter symbol    Parameter meaning

$Z_{ij}$    Standardized score
$X_{\sigma}$    Standard deviation between the original score and the standard score in the first evaluation
$I$    Indicator function
$X$    Original rating matrix
PCA    Principal component analysis
$\sigma_i$    Standard deviation of the i-th reviewer
$w$    Rating weight of evaluation experts
$\mu(x)$    Membership degree
$A$    Fuzzy set
$B$    Comprehensive fuzzy set
$R_j$    Range of the j-th work
$T_j$    Comprehensive rating of the j-th work
[symbol illegible]    Ranking of the j-th work
$\bar{S}_i$    Average score of the i-th evaluation expert
$\alpha$    Range threshold
$\beta$    Threshold for comprehensive evaluation
Score    Comprehensive standard score
$R^{*}$    Adjusted range
$T_j^{*}$    Adjusted comprehensive score
$\gamma$    Adjusted proportion
MSE    Mean square error
$R^2$    Coefficient of determination
$r$    Pearson correlation coefficient


Table 1: Symbol Description Table.


Copyright© Bin Z, et al.


Experiments


In this section, we model and solve the four methods through experiments, which are divided into four subsections.


0-1 Programming Model


We should first deal with the number of intersections of works and the comparability of scores under a fixed number of participating teams, judges, and judges for each work [27-30]. A 0-1 programming model is established to optimize the intersections of works among judges [31,32]. This subsection mainly focuses on the establishment, solving, and analysis of the 0-1 programming model.

Establishment of the 0-1 Programming Model: The objective of this model is to optimize the intersections of works among judges, that is, to make the intersections of works as large as possible and to increase the comparability of scores. The constraints are:


Each work is judged by exactly $D$ judges:

$$\sum_{n=1}^{125} x_{nm} = D \tag{1}$$

where $m = 1, 2, \dots, 3000$.

Each judge must judge exactly $C$ works:

$$\sum_{m=1}^{3000} x_{nm} = C \tag{2}$$

where $n = 1, 2, \dots, 125$.

If the sets of works judged by the n-th and the q-th judges have an intersection, then $y_{nq} = 1$; otherwise $y_{nq} = 0$, that is,

$$y_{nq} = \max\{x_{nm}\,x_{qm} : m = 1, 2, \dots, 3000\} \tag{3}$$

where $n, q = 1, 2, \dots, 125$, $n \ne q$.
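The constraints above can be checked numerically on any candidate assignment. The following Python sketch is only an illustration, not the solver used in this paper: it builds a simple cyclic assignment (a hypothetical construction chosen because it is easy to verify), confirms that every work has 5 judges and every judge has 120 works, and counts the number of judge pairs that share at least one work. The function names are illustrative assumptions.

```python
import numpy as np

N_EXPERTS, N_WORKS, D = 125, 3000, 5
C = N_WORKS * D // N_EXPERTS  # 120 works per expert

def cyclic_assignment(n_experts=N_EXPERTS, n_works=N_WORKS, d=D):
    """Hypothetical feasible plan: work m goes to experts (m + k) mod n_experts, k = 0..d-1."""
    x = np.zeros((n_experts, n_works), dtype=int)
    for m in range(n_works):
        for k in range(d):
            x[(m + k) % n_experts, m] = 1
    return x

def total_intersections(x):
    """Count expert pairs (n < q) that share at least one work, i.e. pairs with y_nq = 1."""
    shared = x @ x.T                # shared[n, q] = number of works judged by both n and q
    y = (shared > 0).astype(int)    # y_nq = max_m x_nm * x_qm
    np.fill_diagonal(y, 0)          # exclude n == q
    return int(np.triu(y, k=1).sum())

x = cyclic_assignment()
works_per_expert = x.sum(axis=1)    # left-hand side of constraint (2)
experts_per_work = x.sum(axis=0)    # left-hand side of constraint (1)
z = total_intersections(x)
```

A cyclic plan satisfies both constraints but links each judge only to nearby judges, so its intersection count is far from the maximum; it serves as a feasibility baseline against which an optimized plan can be compared.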


Solution and Analysis of the Model 


Design of Calculation Methods and Determination of Parameters

• Design of Calculation Methods:

• Review Coverage Rate: The proportion of experts who review each work; the higher this indicator, the more review opinions each work receives and the higher the fairness.

$$\frac{D}{125} \tag{4}$$

• Review Overlap Rate: The ratio of the intersection between the works reviewed by each expert and the works reviewed by other experts; the higher this indicator, the more intersections between the reviewed works and the higher the comparability of ratings.

$$\frac{Z}{125 \times 125 \div 2} \tag{5}$$

• Evaluation Balance: The balance between the difficulty and innovation of the works reviewed by each expert, i.e.

$$\frac{1}{125}\sum_{n=1}^{125}\left|\frac{1}{C}\sum_{m=1}^{3000} x_{nm}\,s_m - \bar{s}\right| \tag{6}$$

where $s_m$ refers to the score of difficulty and innovation for the m-th work, and $\bar{s}$ to the average score of difficulty and innovation for all works.

Figure 5: Process of the 0-1 programming model. (Flow chart; recoverable labels: calculation of indicators, comparison.)
We model this problem as a combinatorial optimization problem [33,34] and use the linear programming algorithm to solve the 0-1 integer programming problem. The steps are as follows:
• Determination of key variables: Define a binary variable $x(n,m)$, where $n$ represents the review expert number and $m$ the work number.

$$x(n,m) \in \{0, 1\} \tag{8}$$

• Add constraint conditions: Limit the maximum number of works that can be reviewed by each review expert. For every review expert $n$, the following constraint ensures that the expert does not exceed the maximum number of works $e$:

$$\sum_{m} x(n,m) \le e \tag{9}$$

• Each work needs to be reviewed by peer reviewers: The following constraint ensures that every work $m$ is reviewed by the required number $f$ of peer reviewers:

$$\sum_{n} x(n,m) = f \tag{10}$$




Open Access Journal of Data Science & Artificial Intelligence 


• Parameter Determination:

$x_{nm}$, as a 0-1 variable, represents whether the m-th work is reviewed by the n-th expert, where m = 1, 2, ..., 3000 and n = 1, 2, ..., 125.

$y_{nq}$, as a 0-1 variable, represents whether the collections of works reviewed by the n-th and q-th experts intersect, where n, q = 1, 2, ..., 125 and n ≠ q.

$Z$, as an integer variable, represents the total number of intersections of all expert-reviewed work collections, i.e.

$$Z = \sum_{n<q} y_{nq} \tag{11}$$

$C$, as a constant, represents the number of works reviewed by each expert, i.e.

$$C = \frac{3000 \times 5}{125} = 120 \tag{12}$$

$D$, as a constant, represents the number of experts evaluating each work, i.e. D = 5.


Parameter symbol    Symbol category    Parameter meaning    Value range

$x_{nm}$    0-1 variable    Whether the m-th work has been reviewed by the n-th expert    m = 1, 2, ..., 3000; n = 1, 2, ..., 125
$y_{nq}$    0-1 variable    Whether there is an intersection between the collections of works reviewed by the n-th and q-th experts    n, q = 1, 2, ..., 125; n ≠ q
$Z$    Integer variable    The total number of intersections between the collections of works reviewed by all experts    $Z = \sum_{n<q} y_{nq}$
$C$    Constant    Number of works reviewed by each expert    $C = \frac{3000 \times 5}{125} = 120$
$D$    Constant    Number of experts reviewing each work    D = 5


Table 2: 0-1 Programming Model Parameter Table.


Result Analysis 


Figure 6: Calculation Results. (Panel title: Cross-Distribution Result.)

Based on the analysis of the results, the model has successfully achieved an optimal “cross-distribution” scheme. The average distance between the evaluation experts and the works is 1439.29893. Each reviewer reviewed 120 works, each work was reviewed by 5 experts, and there were no duplicate reviews.


Comprehensive Review Model 


To solve the problem that the assumption of the 
standard score evaluation scheme may not hold, an improved 
comprehensive evaluation model was established [35,36]. 


Data Preprocessing: This section performed corresponding 
processing on data 1. 


Figure 7: Data preprocessing flow chart. (Recoverable steps: abnormal value processing, with deletion and reconsideration of team results; filtering the data; reordering and renaming the fields, including final score, ranking, awards, the original and standard scores of the first review, and the original and standard scores of the second review; passing the result to the model as input data.)


Establishment of Comprehensive Review Model 
Z-score Evaluation Model 

The raw scores were standardized, the influence of dimensions and scales on the scores was eliminated, and the scores of different evaluation experts were compared and ranked:

$$Z_{ij} = \frac{S_{ij} - \bar{S}_i}{\sigma_i} \tag{13}$$

where $S_{ij}$ denotes the original score, $\bar{S}_i$ denotes the average score of the i-th reviewer, $\sigma_i$ denotes the standard deviation of the i-th reviewer, and $Z_{ij}$ denotes the standardized score.

Then, the standardized scores of each work over the reviewers are summed to obtain the comprehensive score:

$$T_j = \sum_{i=1}^{m} Z_{ij} \tag{14}$$

where $T_j$ denotes the comprehensive score of the j-th work, $Z_{ij}$ denotes the standardized score of the i-th reviewer for the j-th work, and $m$ denotes the number of reviewers.

According to the size of the comprehensive score, the works are sorted to determine their final ranking:

$$R_j = \sum_{k=1}^{n} I\left(T_k \ge T_j\right) \tag{15}$$

where $R_j$ denotes the ranking of the j-th work, $T_j$ denotes the comprehensive score of the j-th work, $I$ denotes the indicator function, which takes the value 1 when the condition in the parentheses is satisfied and 0 otherwise, and $n$ denotes the total number of works.
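The three steps of the Z-score model (standardize per reviewer, sum per work, rank by counting) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a complete score matrix in which every reviewer scores every work, uses the population standard deviation, and counts works with a score greater than or equal to the current one so that the best work receives rank 1. The function name and the sample scores are hypothetical.

```python
import numpy as np

def zscore_ranking(scores):
    """scores[i, j]: raw score of reviewer i for work j (m reviewers x n works)."""
    scores = np.asarray(scores, dtype=float)
    mean_i = scores.mean(axis=1, keepdims=True)   # average score of each reviewer
    std_i = scores.std(axis=1, keepdims=True)     # population std (ddof=0), an assumption
    z = (scores - mean_i) / std_i                 # standardized scores Z_ij
    t = z.sum(axis=0)                             # comprehensive score T_j of each work
    # rank[j] = number of works k with T_k >= T_j, so the best work gets rank 1
    rank = (t[None, :] >= t[:, None]).sum(axis=1)
    return z, t, rank

# Hypothetical example: 2 reviewers, 3 works.
example_scores = [[80, 70, 90],
                  [60, 50, 85]]
z, t, rank = zscore_ranking(example_scores)
```

Because each reviewer's scores are centred and scaled separately, a lenient reviewer and a strict reviewer contribute on the same scale before the scores are summed.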


PCA Evaluation Model 

• Data preparation: Data pre-processing obtains a raw matrix $X$ of size $m \times n$, where $m$ represents the number of works and $n$ represents the number of experts.

$$\bar{X}_i = \frac{\sum_{j=1}^{n}\left(X_{ij} + X'_{ij}\right)}{n} \tag{16}$$

where $X_{ij}$ denotes the original score of the i-th work by the j-th expert, $X'_{ij}$ denotes the standard score of the i-th work by the j-th expert, and $\bar{X}_i$ denotes the average value calculated in the review process.


Next, calculate the highest and lowest scores of each work in the first review:

$$X_{i,\max} = \max_{j=1}^{n} X_{ij} \tag{17}$$

$$X_{i,\min} = \min_{j=1}^{n} X_{ij} \tag{18}$$

where $X_{i,\max}$ denotes the highest review score obtained by the i-th work in the review process, and $X_{i,\min}$ denotes the lowest score of the i-th work in the review process.

Finally, calculate the range of each work in the first review process:

$$R_i = X_{i,\max} - X_{i,\min} \tag{19}$$
Using the same calculation method, the corresponding 
average and range of the second review process can be 


calculated, and the specific calculation process is consistent 
with the above. 


Principal component analysis: First, calculate the standard deviation of the original score and the standard score of each work in the first review:

$$X_{\sigma} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(X_i - \bar{X}\right)^2} \tag{20}$$

where $m$ denotes the total number of works, $X_i$ denotes the score value of the i-th team’s work, $\bar{X}$ denotes the average value of all team data, and $X_{\sigma}$ denotes the standard deviation of the work review.


Next, calculate the standard deviation of the original 
score and the standard score of the second review, and the 
specific calculation process is consistent with the method of 
the first review process. 


Finally, according to the PCA principle [37-40], calculate the principal components of the first and second reviews to facilitate the subsequent calculation of the comprehensive standard score. The principal component is calculated as:

$$PC_1 = \sum_{j=1}^{n} w_j \times X_j \tag{21}$$

where $PC_1$ is the first principal component. Similarly, the second review result can be obtained as the second principal component for subsequent calculation and evaluation, that is, $PC_2$ is the second principal component.

Calculate the comprehensive standard score:

$$Score = \alpha \times PC_1 + \beta \times PC_2 \tag{22}$$

where $Score$ is the comprehensive standard score, and $\alpha$ and $\beta$ are adjustable weight factors.

• Sorting: After calculating the comprehensive standard scores, sort them in descending order to determine the final ranking.
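As an illustration of how a principal component acts as a weighted sum of expert scores, the following Python sketch derives the weights $w_j$ from a singular value decomposition and then combines the first-review and second-review components with the weights $\alpha$ and $\beta$. Several points are assumptions rather than details given in the paper: the scores are centred per expert before projection, the sign of the component is oriented so that the weights sum to a non-negative value, and $\alpha = \beta = 0.5$ is used as a placeholder. The function names and sample matrices are hypothetical.

```python
import numpy as np

def first_pc_scores(X):
    """Project the rows of X (works x experts) onto the first principal component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each expert's column (assumption)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    w = vt[0]                                   # unit-length weight vector w_j
    if w.sum() < 0:                             # SVD sign is arbitrary; orient weights to sum >= 0
        w = -w
    return Xc @ w, w                            # PC value per work, and the weights

def composite_score(X_first, X_second, alpha=0.5, beta=0.5):
    """Score = alpha * PC_1 + beta * PC_2, with one component per review stage."""
    pc1, _ = first_pc_scores(X_first)
    pc2, _ = first_pc_scores(X_second)
    return alpha * pc1 + beta * pc2

# Hypothetical data: 4 works, 5 first-stage experts and 3 second-stage experts.
first_review = [[70, 72, 68, 75, 71],
                [85, 88, 90, 84, 87],
                [60, 65, 58, 62, 61],
                [78, 80, 79, 77, 82]]
second_review = [[72, 70, 74],
                 [89, 91, 86],
                 [59, 63, 60],
                 [80, 78, 81]]
score = composite_score(first_review, second_review)
order = np.argsort(-score)                      # work indices sorted by descending score
```

The sign convention matters because a principal component is defined only up to sign; without it, the descending sort could reverse the ranking.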


Bin Z, et al. Biomathematical Modeling in Evaluation Scheme for Large Scale Innovation Competition. J Data Sci Artificial Int 2024, 2(1): 000114.




Fuzzy Evaluation Model 

Based on the principle of the fuzzy evaluation scheme, we constructed a mathematical model, which contains the following main parts:
Part 1: We regard the evaluation result of each work as a fuzzy set, which is formed based on the scores of the evaluation experts [41-44]. The score is $x$, the maximum value of the score is $M$, and the minimum value of the score is $m$:

$$\mu(x) = \begin{cases} 0, & x < m \\ \dfrac{2(x - m)}{M - m}, & m \le x \le \dfrac{M + m}{2} \\ \dfrac{2(M - x)}{M - m}, & \dfrac{M + m}{2} < x \le M \\ 0, & x > M \end{cases} \tag{23}$$

The membership function graph of the triangular fuzzy number is as follows:

Figure 8: Membership function graph of the triangular fuzzy number. (Vertical axis: degree of membership; horizontal axis: score x, with marks at m, (M+m)/2, and M.)

If the distribution of scores is asymmetric, we can 
consider using the membership function of trapezoidal fuzzy 
numbers: 


Figure 9: Membership function diagram of trapezoidal fuzzy numbers. (Vertical axis: degree of membership; horizontal axis: score x, with marks at m, Q1, Q3, and M.)

Let $x$ be the score, $M$ be the maximum score, $m$ be the minimum score, $Q3$ be the upper quartile of scores, and $Q1$ be the lower quartile of scores.

$$\mu(x) = \begin{cases} 0, & x < m \\ \dfrac{x - m}{Q1 - m}, & m \le x < Q1 \\ 1, & Q1 \le x \le Q3 \\ \dfrac{M - x}{M - Q3}, & Q3 < x \le M \\ 0, & x > M \end{cases} \tag{24}$$

Part 2: Using fuzzy mathematics [45], the fuzzy set of the work is recorded as $A$, the weight of the evaluation scores given by the evaluation experts is $w$, and the comprehensive fuzzy set is $B$.

$$B = \sum_{i=1}^{n} w_i A_i \tag{25}$$

Part 3: Multiply the membership degrees of each rating by their respective weights, and then add them up to obtain the composite membership degree.

$$\mu_B(x) = \sum_{i=1}^{n} w_i\,\mu_{A_i}(x) \tag{26}$$

Part 4: Sorting Method for Composite Fuzzy Set of Works: Using fuzzy mathematics [46], sort each work’s composite fuzzy set to determine the work’s ranking. The composite fuzzy set of the work is $B$, and its membership degree function is $\mu_B(x)$.

$$\max_{x \in X} \mu_B(x) \tag{27}$$

$$\sum_{x \in X} \mu_B(x) \tag{28}$$
Solution and Analysis of Comprehensive Review Model

Design of Calculation Methods and Determination of Parameters

• Design of Calculation Methods for Z-Score Evaluation Model: The key variables are determined as follows: $S_{ij}$ refers to the original rating of the i-th reviewer on the j-th work, $Z_{ij}$ refers to the standardized rating of the i-th reviewer on the j-th work, $T_j$ refers to the comprehensive rating of the j-th work, $R_j$ refers to the ranking of the j-th work, $m$ refers to the total number of reviewers, $n$ refers to the total number of works, $\bar{S}_i$ refers to the average rating of the i-th reviewer, and $\sigma_i$ represents the standard deviation of the i-th reviewer.

• Standardized Rating:

$$Z_{ij} = \frac{S_{ij} - \bar{S}_i}{\sigma_i} \tag{29}$$

where $\bar{S}_i = \frac{1}{n}\sum_{j=1}^{n} S_{ij}$ and $\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(S_{ij} - \bar{S}_i\right)^2}$.

• Comprehensive Rating: Calculate the sum of the standardized ratings of each work over all reviewers, i.e.

$$T_j = \sum_{i=1}^{m} Z_{ij} \tag{30}$$

• Ranking: Sort works based on their overall rating, i.e.

$$R_j = \sum_{k=1}^{n} I\left(T_k \ge T_j\right) \tag{31}$$

where $I$ represents the indicator function, which takes a value of 1 when the condition in parentheses is true and 0 otherwise.


• Design of Calculation Methods for PCA Evaluation Model:

Figure 10: PCA Modeling Flowchart. (Recoverable labels: principal component calculation, comprehensive standard score.)


For the Fuzzy Evaluation Model, convert the scores of 
each evaluation expert into fuzzy numbers, establish a fuzzy 
evaluation matrix, determine the weight vector of evaluation 
indicators, calculate the fuzzy evaluation vector, and finally 
sort the works based on the size of the fuzzy numbers. 


For the Z-Score Evaluation Model, the parameters are determined as follows:

• Number of review experts m: m = 8.

• Total number of works n: n = 352 (works entering the second round).

• Original score: The original score is an 8 × 352 matrix, where each element $S_{ij}$ takes an integer value between 0 and 100 and represents the i-th evaluation expert’s rating of the j-th work.

• Ranking: Sort works based on their overall rating, i.e.

$$R_j = \sum_{k=1}^{n} I\left(T_k \ge T_j\right) \tag{32}$$

where $I$ is 0 or 1.

Parameter symbol    Symbol category    Parameter meaning    Value range

$m$    Integer variable    Total number of review experts    m = 8
$n$    Integer variable    Total number of works    n = 351
$S_{ij}$    Matrix    The original rating, an 8 × 351 matrix    Each element ranges from 0 to 100
$R_j$    Integer variable    Ranking    $R_j = \sum_{k=1}^{n} I(T_k \ge T_j)$
$I$    0-1 constant    Indicator function    0 or 1
$\sigma_i$    Variable    Standard deviation of the i-th evaluation expert    Greater than or equal to 0
$T_j$    Variable    Comprehensive rating of the j-th work, the sum of its standardized ratings by the evaluation experts    $T_j = \sum_{i=1}^{m} Z_{ij}$


Table 3: Z-Score Evaluation Model Parameter Information Table.


Parameter symbol    Parameter meaning

$x_{nm}$    Whether the m-th work has been reviewed by the n-th expert
$y_{nq}$    Whether there is an intersection between the collections of works reviewed by the n-th and q-th experts
$Z$    The total number of intersections between the collections of works reviewed by all experts
$C$    Number of works reviewed by each expert
$D$    Number of experts reviewing each work
$X$    Original data matrix of size $m \times n$, where $m$ represents the number of works and $n$ represents the number of experts
$X_{ij}$    Original score of the i-th work rated by the j-th expert
$X'_{ij}$    Standard score of the i-th work rated by the j-th expert
$\bar{X}_i$    Average of the original and standard scores for the first review of the i-th work
$X_{i,\max}$    Highest score in the first review of the i-th work
$X_{i,\min}$    Lowest score in the first review of the i-th work
$R_i$    Range of the first review of the i-th work
$X_{\sigma}$    Standard deviation between the original score and the standard score for the first evaluation of all works
$PC_1$    First principal component
$PC_2$    Second principal component
$\alpha$    Weight for the first review
$\beta$    Weight for the second review


Table 4: PCA Evaluation Model Parameter Information Table.




Parameter symbol    Parameter meaning    Value range

$\mu(x)$    Membership degree    0-1
$m$    Minimum score    0 < m < M
$M$    Maximum score    m < M
$x$    Score    Greater than 0
$w$    Rating weight of evaluation experts    0-1
$A$    Fuzzy set    0-1
$B$    Comprehensive fuzzy set    $B = \sum_{i=1}^{n} w_i A_i$


Table 5: Fuzzy Evaluation Model Parameter Information Table.


Result Analysis
• Z-Score Evaluation Solution:

Based on the original rating $S_{ij}$, calculate the average rating $\bar{S}_i$ and standard deviation $\sigma_i$ for each reviewer, and then calculate the standardized rating $Z_{ij}$ for each work.

Calculate the comprehensive rating $T_j$ for each work based on the standardized ratings $Z_{ij}$.

Calculate the ranking $R_j$ of each work based on the comprehensive rating $T_j$.

• Fuzzy Evaluation Solution:

According to the fuzzy evaluation vector $B$, the evaluation objects are sorted according to the size relationship of the fuzzy numbers, and the final order of the evaluation objects is obtained.


Figure 11: Scatter Chart of Traditional Standard Score and Z-score Ranking. (Scatter plot of standard score rankings and Z-score rankings.)

Figure 12: Scatter Chart of Traditional Standard Score and PCA Ranking. (Scatter plot of standard score rankings and PCA score rankings.)

Figure 13: Scatter Chart of Traditional Standard Score and Fuzzy Evaluation Ranking. (Scatter plot of standard score rankings and fuzzy evaluation rankings.)


By calculating the rankings of the Z-score model, PCA model, and fuzzy evaluation scheme, and comparing them with the traditional standard score evaluation scheme, as shown in Figures 11-13 and Table 6, the Z-score evaluation scheme has a significantly higher correlation with the traditional standard score evaluation scheme than the PCA evaluation scheme and the fuzzy evaluation scheme.


Comparison of evaluation schemes    Correlation coefficient (to two decimal places)

Traditional standard score scheme and Z-score evaluation scheme    0.94
Traditional standard score scheme and PCA evaluation scheme    0.86
Traditional standard score scheme and fuzzy evaluation scheme    0.89


Table 6: Comparison of Evaluation Schemes.
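The correlation coefficients in Table 6 compare two rankings of the same works. A minimal Python sketch of the Pearson correlation coefficient between two ranking vectors is given below; the function name and the five-work example are hypothetical and are not the competition data.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()                 # centre both sequences
    b_c = b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))

# Hypothetical rankings of five works under two schemes (works 2 and 3 swapped).
baseline_rank = [1, 2, 3, 4, 5]
candidate_rank = [1, 3, 2, 4, 5]
r = pearson_r(baseline_rank, candidate_rank)
```

When both inputs are rankings without ties, this value coincides with Spearman's rank correlation, so a value close to 1 means the two schemes order the works almost identically.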
Designing a New Standard Score Calculation Model

Firstly, we assume that each expert gives an original score for each work, representing their subjective evaluation of the quality. We use $a_{ij}$ to represent the original score given by the i-th expert for the j-th work.

Secondly, we consider the objective evaluation of each expert on the quality of the work. We use $b_{ij}$ to represent the objective score given by the i-th expert for the j-th work. The objective score is calculated by dividing the original score by the average original score of the expert’s evaluated works and then multiplying by the original score, expressed as:

$$b_{ij} = \frac{a_{ij}}{E(a_i)} \times a_{ij} \tag{37}$$

where $E(a_i)$ represents the expected value of the scores given by the i-th expert to all works.


Thirdly, we consider the comparability of each work among different experts, which refers to its relative position in the works evaluated by different experts. We use $c_{ij}$ to represent the comparable score given by the i-th expert for the j-th work. The comparable score is calculated by subtracting the minimum original score of the expert’s evaluated works from the original score and then dividing by the difference between the highest original score and the lowest original score of the expert’s evaluated works:

$$c_{ij} = \frac{a_{ij} - \min(a_i)}{\max(a_i) - \min(a_i)} \tag{38}$$

Finally, we comprehensively consider the above three aspects
to give a standard score for each work under evaluation by
each expert. We use $d_{ij}$ to represent the standard score given
by the i-th expert for the j-th work:

$$d_{ij} = \frac{1}{2}\left[\frac{a_{ij}}{E(a_i)} \times a_{ij} + \frac{a_{ij} - \min(a_i)}{\max(a_i) - \min(a_i)}\right] \tag{39}$$


We compared the ranking produced by this final score with
the actual award-winning paper ranking agreed upon by
multiple experts, and found that the rank correlation of the
new standard score formula was 96.32%. This indicates that
the new standard score formula is reliable.
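
The three scores defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `standard_scores` and the sample matrix are hypothetical, each expert is assumed to have scored every work, and the final combination is taken as the equal-weight average of the objective and comparable scores, which is one reading of Eq. (39) rather than a confirmed implementation.

```python
from statistics import mean

def standard_scores(a):
    """a[i][j] is the original score of expert i for work j.

    Returns (b, c, d): objective scores (Eq. 37), comparable scores
    (Eq. 38) and the combined standard scores.
    """
    b, c, d = [], [], []
    for row in a:
        avg, lo, hi = mean(row), min(row), max(row)
        b_row = [x / avg * x for x in row]  # a_ij / E(a_i) * a_ij
        # Min-max position within this expert's scores (0.0 if all equal).
        c_row = [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in row]
        # Assumption: equal-weight average of the two derived scores.
        d_row = [(bx + cx) / 2 for bx, cx in zip(b_row, c_row)]
        b.append(b_row)
        c.append(c_row)
        d.append(d_row)
    return b, c, d

# Hypothetical data: 2 experts, 3 works.
scores = [[60, 80, 100], [50, 70, 90]]
b, c, d = standard_scores(scores)
print(d[0])
```

Because the objective score grows with the square of the original score while the comparable score stays in [0, 1], the first term dominates the average; any reweighting would be a modelling choice outside what the text specifies.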


Range Model 


To address the issue of extreme variability in large-scale
innovation competitions [47-49], this section establishes a
programmatic “range” model.


Data Preprocessing 

To facilitate data reading and calculation, the ratings of
the five first-stage experts and the three second-stage experts
in Data 2.1 and Data 2.2 are split into separate tables.


Establishment of the “Range” Model 

• Determination of Key Variables: $S_{ij}$ denotes the
original score of the i-th review expert for the j-th work,
$Z_{ij}$ denotes the standardized score of the i-th review
expert for the j-th work, $R_j$ denotes the range of the j-th
work, $T_j$ denotes the comprehensive score of the j-th
work, $P_j$ denotes the ranking of the j-th work, m denotes
the total number of review experts, n denotes the total
number of works, $\bar{S}_i$ denotes the average score of the
i-th review expert, $\sigma_i$ denotes the standard deviation of
the i-th review expert, $\alpha$ denotes the range threshold
[50], and $\beta$ denotes the comprehensive score threshold.

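• Standardized Score

$$Z_{ij} = \frac{S_{ij} - \bar{S}_i}{\sigma_i} \tag{40}$$

where $\bar{S}_i = \frac{1}{n}\sum_{j=1}^{n} S_{ij}$ and $\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(S_{ij} - \bar{S}_i\right)^2}$.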


• Range: Calculate the difference between the maximum
and minimum values of the standardized scores of all review
experts for each work, that is,

$$R_j = \max_i Z_{ij} - \min_i Z_{ij} \tag{41}$$


• Comprehensive Score [51]: Calculate the sum of the
standardized scores of all review experts for each work, that is,

$$T_j = \sum_{i=1}^{m} Z_{ij}$$


• Ranking

$$P_j = \sum_{k=1}^{n} I\left(T_k \ge T_j\right) \tag{42}$$

where the indicator function $I(\cdot)$ takes the value 0 or 1.

• Classification: According to the range and
comprehensive score, the works are divided into four
categories, namely: high score and high range, high score
and low range, low score and high range, low score and
low range.
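
The quantities defined above can be computed directly from a score matrix. The following is a minimal sketch, not the authors' implementation: the function name `range_model`, the sample data, and the use of the population standard deviation (division by n) are assumptions, and every expert is assumed to have scored every work with non-identical scores.

```python
from statistics import mean, pstdev

def range_model(S, alpha=2.0, beta=0.0):
    """S[i][j] is the original score of expert i for work j.

    Returns standardized scores Z, ranges R, comprehensive scores T,
    rankings P (1 = best) and a category label per work.
    """
    m, n = len(S), len(S[0])
    Z = []
    for row in S:
        mu, sigma = mean(row), pstdev(row)  # assumes sigma > 0
        Z.append([(s - mu) / sigma for s in row])
    cols = [[Z[i][j] for i in range(m)] for j in range(n)]
    R = [max(col) - min(col) for col in cols]  # range per work
    T = [sum(col) for col in cols]             # comprehensive score
    # Rank of work j = number of works whose score is at least T_j.
    P = [sum(1 for k in range(n) if T[k] >= T[j]) for j in range(n)]
    labels = [
        ("high" if T[j] >= beta else "low") + " score, "
        + ("high" if R[j] >= alpha else "low") + " range"
        for j in range(n)
    ]
    return Z, R, T, P, labels

# Hypothetical data: 3 experts, 3 works.
S = [[60, 70, 80], [50, 90, 70], [65, 75, 85]]
Z, R, T, P, labels = range_model(S)
print(P, labels)
```

With the indicator-sum ranking, tied works share the same (worst) rank among the tie; a different tie-breaking rule would need to be stated separately.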


If the absolute value of the difference between the
standardized score of a review expert and the comprehensive
score satisfies

$$\left|Z_{ij} - T_j\right| \ge \frac{\alpha}{2} \tag{43}$$

it is considered that the score of the review expert
deviates from the scores of the other review experts and needs
to be adjusted. Let the standardized score of the review expert
approach the comprehensive score by a certain proportion,
that is,

$$Z_{ij}^{*} = (1 - \gamma) Z_{ij} + \gamma T_j \tag{44}$$


where $Z_{ij}^{*}$ represents the standardized score after
adjustment and $\gamma$ represents the adjustment ratio, for which
0.5 can be tried first. After adjustment, recalculate the range
and comprehensive score of the work. The adjusted range is

$$R_j^{*} = \max_i Z_{ij}^{*} - \min_i Z_{ij}^{*} \tag{45}$$

and the adjusted comprehensive score is

$$T_j^{*} = \sum_{i=1}^{m} Z_{ij}^{*}$$
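
The adjustment step can be illustrated as follows. This sketch applies the deviation test of Eq. (43) and the update of Eq. (44) literally, comparing each standardized score with the comprehensive score; the function name `adjust_scores` and the toy input are hypothetical, and the default values of `alpha` and `gamma` are the ones quoted in the text.

```python
def adjust_scores(Z, alpha=2.0, gamma=0.5):
    """Z[i][j] is the standardized score of expert i for work j.

    Returns the adjusted scores together with the recalculated
    range and comprehensive score of every work.
    """
    m, n = len(Z), len(Z[0])
    T = [sum(Z[i][j] for i in range(m)) for j in range(n)]
    Z_adj = [row[:] for row in Z]
    for j in range(n):
        for i in range(m):
            # Deviating score: move it towards T_j by the ratio gamma.
            if abs(Z[i][j] - T[j]) >= alpha / 2:
                Z_adj[i][j] = (1 - gamma) * Z[i][j] + gamma * T[j]
    cols = [[Z_adj[i][j] for i in range(m)] for j in range(n)]
    R_adj = [max(col) - min(col) for col in cols]
    T_adj = [sum(col) for col in cols]
    return Z_adj, R_adj, T_adj

# Hypothetical work scored +1, 0 and -1 by three experts.
Z_adj, R_adj, T_adj = adjust_scores([[1.0], [0.0], [-1.0]])
print(Z_adj, R_adj, T_adj)
```

In this toy case the two outer scores are pulled halfway towards the comprehensive score, so the range of the work drops from 2 to 1 while its comprehensive score is unchanged.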


Solution and Analysis of the Model 

Design of Calculation Methods and Determination of 
Parameters 

Design of Calculation Methods: 

Calculate the average score and standard deviation of 
each reviewer based on the original score, and then calculate 
the standardized score for each work [52-54]. 


(46) 


According to their range and comprehensive rating, the
works are divided into four categories. For works with
high scores and a large range, the deviating score is moved
halfway towards the comprehensive rating according to the
adjustment ratio, and the adjusted standardized score, range
and comprehensive rating are obtained. The works are then
sorted according to the comprehensive rating or ranking.

Parameter Determination:

• Number of Evaluation Experts: The number of evaluation
experts in the first stage is 5, and the number of evaluation
experts in the second stage is 3, that is, $m_1 = 5$, $m_2 = 3$.
• Total Number of Works: (Attachment Data 2.1) $n = 240$.
• Original Rating: The original rating is an 8 × 240 matrix
with elements ranging from 0 to 100, whose element $S_{ij}$
refers to the rating of the j-th work by the i-th reviewer.
• Threshold of Range: The range threshold is set to 2,
which means that when the range of a work is greater
than or equal to 2, the range is considered too large and
needs to be adjusted, that is, $\alpha = 2$.
• Threshold of Comprehensive Evaluation: The
threshold of comprehensive evaluation is set to 0,
which means that when the comprehensive evaluation
of a work is greater than or equal to 0, the work is
considered highly innovative and belongs to the high
segment. Otherwise, it belongs to the low segment, that is,
$\beta = 0$.
• Adjustment Ratio: The adjustment ratio is 0.5, which
means that when the range of a work is too large, the
deviating score is adjusted halfway towards the
comprehensive score, that is, $\gamma = 0.5$.


Parameter Symbol    Parameter Meaning
$S_{ij}$            Original rating of the j-th work by the i-th reviewer
$Z_{ij}$            Standardized rating of the j-th work by the i-th reviewer
$R_j$               Range of the j-th work
$T_j$               Comprehensive rating of the j-th work
$P_j$               Ranking of the j-th work
m                   Total number of review experts
n                   Total number of works
$\bar{S}_i$         Average score of the i-th review expert
$\sigma_i$          Standard deviation of the i-th reviewer
$\alpha$            Threshold of the range
$\beta$             Threshold of the comprehensive score

Table 7: Range Model Parameter Information Table
Result Analysis

Figure 14: Line Chart of Mean and Range of Two Stage Scores.


By adjusting the comprehensive rating threshold $\beta$
and the adjustment ratio $\gamma$, the model was revised. The
final works with a “large range” score selected by the model
had an award rate of over 95%, demonstrating the
effectiveness of the model.


Difference Perception Review Model 


This section mainly focuses on the establishment of the 
optimized complete model, as well as data preprocessing, 
the establishment of the difference perception review model, 
solving and analysing the model, and model verification. 


Data Preprocessing 

To better construct the model and train it, we conducted 
statistical analysis on the number of people who won each 
award level and the total number of teams in the original 
data. 


Figure 15: Award Level Distribution (Appendix: Data1).
Establishment of the Difference Perception Review Model

This section proposes a Difference Perception Review
Model that can calculate scores and rank entries based on
input raw data, outputting the final ranking results. The
specific implementation steps of the Difference Perception
Review Model [55-57] are as follows:


Step 1: Input the dataset to be ranked. 

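Step 2: For the input data, first calculate the standard
scores of all first review scores. The specific formula can be
expressed as:

$$Z_{ij} = \frac{S_{ij} - \bar{S}_i}{\sigma_i} \tag{47}$$

$$\bar{S}_i = \frac{1}{n}\sum_{j=1}^{n} S_{ij} \tag{48}$$

$$\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(S_{ij} - \bar{S}_i\right)^2} \tag{49}$$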


Assume there are m teams in total, of which n teams enter
the second-stage evaluation. We then evaluate the remaining
(m-n) teams: the first a of these teams are directly rated as
third-prize winners, and all other remaining teams are rated
as unsuccessful. The parameters m, n, and a are obtained in
the data statistics section.
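
The split described above can be sketched as follows. The text does not state the ordering criterion explicitly, so ranking teams by their mean first-stage standard score is an assumption of this sketch, as are the function name `first_stage_split` and the sample values.

```python
def first_stage_split(stage1_score, n, a):
    """stage1_score maps team id -> mean first-stage standard score.

    Returns (teams entering stage 2, teams rated third prize directly,
    teams rated unsuccessful), each ordered from best to worst.
    """
    ranked = sorted(stage1_score, key=stage1_score.get, reverse=True)
    stage2 = ranked[:n]           # top n teams go on to stage 2
    third_prize = ranked[n:n + a]  # next a teams: third prize
    unsuccessful = ranked[n + a:]  # everyone else: no award
    return stage2, third_prize, unsuccessful

# Hypothetical scores for m = 5 teams, with n = 2 and a = 2.
scores = {"T1": 1.2, "T2": 0.4, "T3": -0.1, "T4": -0.9, "T5": 0.8}
stage2, third_prize, unsuccessful = first_stage_split(scores, n=2, a=2)
print(stage2, third_prize, unsuccessful)
```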


Step 3: For the teams entering Stage 2 evaluations, we
first differentiate them based on the range of their Stage 2
review scores. If the range does not exceed a threshold
$\delta$ (in our model, $\delta = 20$), then the final score is
equal to the weighted sum of the average score S1 from the
first review, the standard scores S2 from the second review,
and the average score S3 of the original scores in Stage 2.
The specific process can be expressed using a formula:

$$\text{Score} = \alpha \times S1 + S2 + \mu \times S3 \tag{50}$$

$$S1 = (A + B + C + D + E)/5 \tag{51}$$

$$S2 = \beta \times H + \gamma \times F + \varepsilon \times G \tag{52}$$

$$S3 = (M + N + Q)/3 \tag{53}$$

Step 4: In this model, A, B, C, D, and E represent the standard
scores calculated from the five experts in the first review;
H, F, and G represent the standard scores calculated from
the three experts during Stage 2; and M, N, and Q represent
the original scores given by the three experts during Stage 2.
$\alpha$, $\mu$, $\beta$, $\gamma$ and $\varepsilon$ are all weight
factors, and their sum is equal to 1. The specific weight values
can be obtained from model training. The optimal value for
$\alpha$ is 0.6, while $\mu$, $\beta$, $\gamma$ and $\varepsilon$
are all set to 0.1. If the range is greater than the threshold
$\delta$, the final score is equal to the weighted sum of the
average score S1 from the first review, the standard scores S4
from the second review, and the average score S5 of the
review scores in Stage 2. The specific process can be
expressed using a formula:

$$\text{Score} = \alpha \times S1 + S4 + \mu \times S5 \tag{54}$$

$$S4 = \beta \times H + \gamma \times F + \varepsilon \times G \tag{55}$$

$$S5 = (J + K + L)/3 \tag{56}$$

where J, K, and L represent the review scores given by the
three experts in Stage 2.
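
The scoring rule of Steps 3 and 4 can be sketched as below, using the weights and threshold quoted in the text. Several details are assumptions of this sketch rather than statements from the paper: the range is measured on the Stage 2 original scores, the review scores J, K, L are used only when that range exceeds the threshold, S1 is the mean of the five first-stage standard scores, and the names `final_score`, `WEIGHTS` and `DELTA` are hypothetical.

```python
from statistics import mean

WEIGHTS = {"alpha": 0.6, "mu": 0.1, "beta": 0.1, "gamma": 0.1, "epsilon": 0.1}
DELTA = 20  # range threshold quoted in the text

def final_score(stage1_std, stage2_std, stage2_orig, stage2_review=None,
                w=WEIGHTS, delta=DELTA):
    """Weighted final score of one team that entered Stage 2.

    stage1_std:    standard scores A..E from the first review
    stage2_std:    standard scores (H, F, G) from Stage 2
    stage2_orig:   original Stage 2 scores (M, N, Q)
    stage2_review: review scores (J, K, L), used for large ranges
    """
    s1 = mean(stage1_std)
    h, f, g = stage2_std
    s_std = w["beta"] * h + w["gamma"] * f + w["epsilon"] * g
    large_range = max(stage2_orig) - min(stage2_orig) > delta
    if large_range and stage2_review is not None:
        basis = stage2_review  # large range: use review scores (S5)
    else:
        basis = stage2_orig    # otherwise: use original scores (S3)
    return w["alpha"] * s1 + s_std + w["mu"] * mean(basis)

# Hypothetical team whose Stage 2 range stays within the threshold.
print(final_score([1, 1, 1, 1, 1], (1, 1, 1), (60, 70, 80)))
```

Note that the weighted terms mix standard scores with raw scores on a 0 to 100 scale, so the raw-score average dominates the result unless the inputs are rescaled.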


Step 5: Subsequently, the teams that will undergo the 
second evaluation are arranged in descending order of their 
final scores, and the performance grades are divided based 
on these scores. Each grade is then further divided by the 
number of teams in it. Finally, the data tables for the teams 
that have gone through both the first and second evaluations 
as well as those that only went through the first evaluation 
are combined into one data table, sorted, and outputted. The 
final score sheet for this model is also outputted. 

Step 6: We calculate the correlation between the output
score sheet and the correctly ordered original data table.
We then plot a heat map to visualize and evaluate
the degree of correlation between them.
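
The correlation values behind such a heat map can be computed as a pairwise matrix. The sketch below uses the Pearson coefficient, which is an assumption because the text does not name the coefficient used in this step; the column names follow the labels of the heat map, and the data are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = sqrt(sum((p - mx) ** 2 for p in x))
    sy = sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """columns maps a name to a list of values; returns nested dicts."""
    names = list(columns)
    return {p: {q: pearson(columns[p], columns[q]) for q in names}
            for p in names}

# Hypothetical rankings: R1 from the original results, R2 from the model.
table = {"R1": [1, 2, 3, 4, 5], "R2": [1, 3, 2, 4, 5]}
matrix = correlation_matrix(table)
print(round(matrix["R1"]["R2"], 2))
```

The resulting matrix can be passed to any plotting library to draw the heat map; the plotting call itself is omitted here.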


Solution and Analysis of the Model 
Design of Calculation Methods and Determination of 
Parameters: 

The Difference Perception Review Model mainly includes 
three main parts: data pre-processing, model design, and 
model evaluation. The specific details and ideas of model 
design can be seen in Figure 16. 
Figure 16: Difference Perception Review Model Modeling
Flowchart.


• Parameter Determination:

In the Difference Perception Review Model, the main
parameters we need are the expert ratings A, B, C, D, E, F, G, H
provided in the two stages and the weight parameters
$\alpha$, $\mu$, $\beta$, $\gamma$, $\varepsilon$.


Parameter Symbol                                   Parameter Meaning
n                                                  Number of teams entering the second evaluation
m                                                  Total participants
a                                                  Number of teams directly rated as third prize
$\delta$                                           Threshold on the range scores
S2                                                 Standard score given by three experts in the second stage
S3                                                 The average of the original scores in the second stage
A, B, C, D, E                                      Stage one standard score
H, F, G                                            Second stage standard score
M, N, Q                                            The second stage works original points
$\alpha$, $\mu$, $\beta$, $\gamma$, $\varepsilon$  Weighting factor
J, K, L                                            Review points given by the three experts in the second stage

Table 8: Difference Perception Review Model Parameter
Information Table.
Result Analysis

Figure 17: Ranking level heat map.

Figure 17 is a heat map drawn based on the original
ranking results and some indicators of the Difference
Perception Review Model. R1, A1, and S1 represent the
ranking, awards, and final scores in the original results,
while R2, A2, and S2 represent the ranking, awards, and
final scores in the output results of the Difference Perception
Review Model. Overall, the original ranking data
has a high correlation with the output data of our model,
indicating the feasibility of the calculation method adopted
by the Difference Perception Review Model.


Further data that could be collected in the future include
the following: the professional direction, research background,
review experience and other information of the review
experts; more detailed information about the authors of the
works; records of the review process; and opinions collected
after each evaluation cycle.


Model Validation 

The difference perception review model is the complete 
review model [58]. To more comprehensively evaluate the 
performance of this model, this section carried out a model 
test. 


Figure 18: Box plot of outlier detection.

The box plot in Figure 18 shows that the data are
distributed relatively uniformly, with relatively few
outliers. The smaller the mean squared error (MSE), the better
the fit of the model. The coefficient of determination $R^2$
measures the model’s ability to explain variance: the closer
$R^2$ is to 1, the better. We calculated the mean squared error
of the model and obtained MSE = 0.24 and a coefficient of
determination of $R^2$ = 0.99 (rounded to two decimal places).
Based on these results, the difference perception review model
has good fitting ability and stability.
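
For reference, the two validation metrics can be computed as in the following sketch. The text does not say which two series were compared, so the reference values and model outputs below are made-up numbers, and the function names `mse` and `r_squared` are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between reference values and model outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical reference scores and model outputs.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(mse(reference, predicted), 3), round(r_squared(reference, predicted), 2))
```

MSE depends on the scale of the scores, so a value such as 0.24 is only meaningful relative to the score range, whereas $R^2$ is scale-free.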


Conclusion 


The main difference between the 0-1 planning model
and other models is that it takes maximizing the intersection
of works among review experts as the optimization objective,
while other models usually take minimizing the difference in
works among review experts as the optimization objective. The
comprehensive review model synthesizes the advantages of
Z-score evaluation models, PCA evaluation models, and fuzzy
evaluation models [59,60]. The “Range” model [61]
comprehensively considers factors such as the difficulty and
innovation of a work, expert ratings, and evaluation criteria,
whereas other models typically focus only on work ratings and
differences or on expert ratings and tendencies, without fully
considering the interrelationships and impacts between
works and experts. Improvement directions for these models
include considering data mining or data analysis methods;
considering multi-objective optimization or multi-criteria
decision-making methods to jointly account for
the intersection and difference of works among review
experts and their rating levels and evaluation standards
[62]; and considering fuzzy mathematics or uncertainty theory
methods to handle rating uncertainty and fuzziness.

The proposed judging scheme for large-scale innovation
contests has potential applications and benefits beyond this
setting. It could lead to more equitable and accurate results
in various assessments and evaluations, such as art tests,
multi-level exams, and public official elections. Its main
advantages are the enhancement of impartiality, consistency,
and objectivity.


Organizing large-scale innovation competitions is a complex
task influenced by many interrelated factors. This paper
focuses on minimizing subjective factors that affect the
scoring of entries, with solutions specifically aimed at
resolving issues related to controversial entries.


To draw broader conclusions and plan future works, the 
following steps could be considered: 

• Expand the Scope of the Study: Investigate the
applicability of the judging scheme to different types of
contests and assessments beyond innovation contests.

• Factor Analysis: Conduct further research into
additional features that may influence the outcome
of competitions. This could include the impact of
contest design, participant demographics, or criteria
transparency.

• Improve the Judging Scheme: Continuously refine
the judging scheme based on feedback and data from
implemented contests to improve its effectiveness and
fairness.

• Quantitative Evaluation: Perform quantitative
analyses to measure the scheme’s success in enhancing
impartiality, consistency, and objectivity.

• Qualitative Feedback: Gather qualitative feedback from
participants, judges, and stakeholders to understand
their perspectives on the fairness and transparency of
the process.

• Case Studies: Develop case studies to illustrate the
effectiveness of the judging scheme in various contexts,
which can provide concrete examples for future
organizers.

• Technological Integration: Explore how technology
can be integrated into the judging scheme to streamline
the process and potentially reduce bias.

• International Perspectives: Consider the cultural
differences that might affect the implementation and
acceptance of the judging scheme in international
contests.

• Ethical Considerations: Address any ethical
implications that may arise from the implementation of
the judging scheme, ensuring that it upholds principles
of fairness and justice.

• Longitudinal Studies: Conduct longitudinal studies to
assess the long-term effects of the judging scheme on the
quality and reputation of competitions.


The proposed judging scheme for large-scale innovation 
contests in this study offers a valuable framework 
that can be extended to various other assessment and 
evaluation scenarios, including art tests, exams, and 
public official elections. Its application promises greater 
equity, impartiality, consistency, and objectivity in the 
evaluation process. However, it is important to recognize 
that implementing and designing such competitions involve 
numerous interconnected factors. 


While this study effectively addresses subjective biases 
in scoring entries and provides solutions for handling 
controversial submissions, there is a need for further 
exploration of additional factors that may influence the 
overall competition quality and sustainability. Future 
research should aim to identify and analyse these factors 
to develop a more comprehensive understanding of the 
complexities involved. 


Moreover, the continuous refinement and improvement 
of the judging scheme are crucial to ensure its relevance 
and effectiveness in evolving contexts. Future works could 
focus on enhancing the scheme’s adaptability, scalability, 
and responsiveness to changing needs and challenges. 
By doing so, the proposed judging scheme can serve as a 
robust foundation for ensuring fair, accurate, and reliable 
assessments in large-scale innovation contests and beyond. 


By pursuing these future works, researchers can not 
only validate and improve the proposed judging scheme but 
also contribute to the broader field of contest design and 
evaluation methodology. 